__nbid__ = '0036'
__author__ = 'Alice Jacques <alice.jacques@noirlab.edu>, Astro Data Lab Team <datalab@noirlab.edu>'
__version__ = '20240607' #yyyymmdd
__datasets__ = ['ls_dr9', 'sdss_dr17', 'des_dr1', 'smash_dr2', 'unwise_dr1', 'allwise', 'nsc_dr2']
__keywords__ = ['crossmatch', 'image cutout']
How to use the pre-crossmatched tables at Astro Data LabΒΆ
by Alice Jacques and the Astro Data Lab Team
GoalsΒΆ
- Understand the structure of the pre-crossmatched tables at Data Lab
- Learn how to access the pre-crossmatched tables at Data Lab
- Compare the retrieval speeds between an on-the-fly crossmatch table and a pre-crossmatched table
For examples using the pre-crossmatched tables hosted at Astro Data Lab, see our Examples using the pre-crossmatched tables notebook.
SummaryΒΆ
Crossmatch table naming templateΒΆ
The crossmatch tables at Astro Data Lab are named as follows:
schema1.xNpN__table1__schema2__table2
where the N in NpN encode the numerical value of the crossmatching radius (since dots '.' are not allowed in table names).
Example:
ls_dr9.x1p5__tractor__nsc_dr2__object
is a crossmatch table (indicated by the leading x after the dot '.'), located in the ls_dr9 schema, and it crossmatches the ls_dr9.tractor table with the nsc_dr2.object table (which lives in the nsc_dr2 schema) within a 1.5 arcseconds radius ('1p5') .
This is admittedly long, but clean, consistent, and most importantly, parsable. The use of double-underscores '__' is to distinguish from single underscores often used in schema and table names.
Columns in crossmatch tablesΒΆ
All pre-crossmatched tables have only these seven columns: id1, ra1, dec1, id2, ra2, dec2, distance. Column descriptions in the pre-crossmatched table contain the original column names in parentheses (this makes them parsable).
For example:
ls_dr9.x1p5__tractor__nsc_dr2__object
| Column | Description | Datatype |
|---|---|---|
| id1 | ID in left/first table (ls_id) | BIGINT |
| ra1 | Right ascension in left/first table (ra) | DOUBLE |
| dec1 | Declination in left/first table (dec) | DOUBLE |
| id2 | ID in right/second table (id) | VARCHAR |
| ra2 | Right ascension in right/second table (ra) | DOUBLE |
| dec2 | Declination in right/second table (dec) | DOUBLE |
| distance | Distance between ra1,dec1 and ra2,dec2 (arcsec) | DOUBLE |
Datatypes in crossmatch tablesΒΆ
- The column data types in a pre-crossmatched table for columns id1 and id2 are retained from the mother tables. The example above, BIGINT, is valid in many cases, but need not be for all data sets.
- The data types for columns ra1, dec1, ra2, dec2 are DOUBLE.
- The column distance can be either REAL or DOUBLE.
OverviewΒΆ
- The following 5 data sets are considered the main reference tables and are crossmatched against all data sets (if there is sky overlap) and when a new data set is ingested:
- latest gaia_drN.gaia_source
- latest nsc_drN.object
- latest unwise_drN.object
- allwise.source
- latest sdss_drN.specobj
- "Crossmatch" means for now "single nearest neighbor" (and this is the current mode at Data Lab).
- Object tables only, not single epoch measurements or metadata tables.
- For every crossmatch table with table1 as the left/first table and table2 as the right/second table, there exists a corresponding crossmatch table with table2 as the left/first table and table1 as the right/second table.
- For example,
allwise.x1p5__source__des_dr2__mainanddes_dr2.x1p5__main__allwise__source.
- For example,
The list of available crossmatch tables can be viewed in our table schema browser under their respective schema.
Disclaimer & attributionΒΆ
DisclaimersΒΆ
Note that using the Astro Data Lab constitutes your agreement with our minimal Disclaimers.
AcknowledgmentsΒΆ
If you use Astro Data Lab in your published research, please include the text in your paper's Acknowledgments section:
This research uses services or data provided by the Astro Data Lab, which is part of the Community Science and Data Center (CSDC) Program of NSF NOIRLab. NOIRLab is operated by the Association of Universities for Research in Astronomy (AURA), Inc. under a cooperative agreement with the U.S. National Science Foundation.
If you use SPARCL jointly with the Astro Data Lab platform (via JupyterLab, command-line, or web interface) in your published research, please include this text below in your paper's Acknowledgments section:
This research uses services or data provided by the SPectra Analysis and Retrievable Catalog Lab (SPARCL) and the Astro Data Lab, which are both part of the Community Science and Data Center (CSDC) Program of NSF NOIRLab. NOIRLab is operated by the Association of Universities for Research in Astronomy (AURA), Inc. under a cooperative agreement with the U.S. National Science Foundation.
In either case please cite the following papers:
Data Lab concept paper: Fitzpatrick et al., "The NOAO Data Laboratory: a conceptual overview", SPIE, 9149, 2014, https://doi.org/10.1117/12.2057445
Astro Data Lab overview: Nikutta et al., "Data Lab - A Community Science Platform", Astronomy and Computing, 33, 2020, https://doi.org/10.1016/j.ascom.2020.100411
If you are referring to the Data Lab JupyterLab / Jupyter Notebooks, cite:
- Juneau et al., "Jupyter-Enabled Astrophysical Analysis Using Data-Proximate Computing Platforms", CiSE, 23, 15, 2021, https://doi.org/10.1109/MCSE.2021.3057097
If publishing in a AAS journal, also add the keyword: \facility{Astro Data Lab}
And if you are using SPARCL, please also add \software{SPARCL} and cite:
- Juneau et al., "SPARCL: SPectra Analysis and Retrievable Catalog Lab", Conference Proceedings for ADASS XXXIII, 2024 https://doi.org/10.48550/arXiv.2401.05576
The NOIRLab Library maintains lists of proper acknowledgments to use when publishing papers using the Lab's facilities, data, or services.
Imports and setupΒΆ
# std lib
from getpass import getpass
# 3rd party
from astropy.utils.data import download_file #import file from URL
from matplotlib.ticker import NullFormatter
import pylab as plt
import matplotlib
%matplotlib inline
# Data Lab
from dl import authClient as ac, queryClient as qc, storeClient as sc
from dl.helpers.utils import convert # converts table to Pandas dataframe object
AuthenticationΒΆ
Much of the functionality of Data Lab can be accessed without explicitly logging in (the service then uses an anonymous login). But some capacities, for instance saving the results of your queries to your virtual storage space, require a login (i.e. you will need a registered user account).
If you need to log in to Data Lab, un-comment the cell below and execute it:
#token = ac.login(input("Enter user name: (+ENTER) "),getpass("Enter password: (+ENTER) "))
ac.whoAmI()
'demo00'
Accessing the pre-crossmatched tablesΒΆ
We can use Data Lab's Query Client to access the pre-crossmatched tables hosted by Data Lab. First let's get a total count of the number of objects (nrows) in SDSS DR17 that are also in LS DR9:
query="SELECT nrows FROM tbl_stat WHERE schema='sdss_dr17' and tbl_name='x1p5__specobj__ls_dr9__tractor'"
# Call query manager
response = qc.query(sql=query)
print(response)
nrows 4559818
Now let's print just the first 100 rows:
query = "SELECT * FROM sdss_dr17.x1p5__specobj__ls_dr9__tractor LIMIT 100"
result = qc.query(sql=query,fmt='pandas')
result
| id1 | ra1 | dec1 | id2 | ra2 | dec2 | distance | |
|---|---|---|---|---|---|---|---|
| 0 | 3384465917919389696 | 287.22826 | 48.064735 | 9907737095837650 | 287.228165 | 48.064735 | 0.227509 |
| 1 | 3384466192797296640 | 287.44889 | 48.229698 | 9907737159009377 | 287.448870 | 48.229697 | 0.049226 |
| 2 | 3384462344506599424 | 287.38750 | 48.168965 | 9907737158950166 | 287.387517 | 48.168933 | 0.121452 |
| 3 | 3384463718896134144 | 287.69779 | 48.382804 | 9907737221862367 | 287.697861 | 48.382752 | 0.252610 |
| 4 | 3384465093285668864 | 287.54718 | 48.407654 | 9907737221859120 | 287.547174 | 48.407548 | 0.381269 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 95 | 3384469766210086912 | 287.60106 | 48.844872 | 9907737284317775 | 287.601034 | 48.844932 | 0.223774 |
| 96 | 3384471690355435520 | 287.70990 | 48.888661 | 9907737346511898 | 287.709937 | 48.888637 | 0.123855 |
| 97 | 3384469491332179968 | 287.66389 | 48.944252 | 9907737346510986 | 287.663800 | 48.944491 | 0.887495 |
| 98 | 3384480486448457728 | 287.22115 | 48.827232 | 9907737284252188 | 287.221105 | 48.827183 | 0.206944 |
| 99 | 3384477737669388288 | 287.29420 | 48.927487 | 9907737346445470 | 287.294186 | 48.927487 | 0.032387 |
100 rows Γ 7 columns
Speed testΒΆ
First exampleΒΆ
Here we compare the speed of using the q3c_join() function to crossmatch directly in a query (query1) versus using a pre-crossmatched table and a JOIN statement (query2). We retrieve the same specified columns and the same random rows for the two queries. We will see that query2 retrieves results faster than query1 can retrieve results.
First, running the crossmatch ourselves:ΒΆ
query1 = """
SELECT
a.source_id AS id1, a.ra AS ra1, a.dec AS dec1,
gg.specobjid AS id2, gg.ra AS ra2, gg.dec AS dec2,
(q3c_dist(a.ra, a.dec, gg.ra, gg.dec)*3600) AS distance
FROM
allwise.source AS a
INNER JOIN LATERAL (
SELECT
s.specobjid, s.ra, s.dec
FROM
sdss_dr17.specobj AS s
WHERE
q3c_join(a.ra, a.dec, s.ra, s.dec, 1.5/3600.0)
ORDER BY
random()
ASC LIMIT 1
) AS gg ON true
WHERE
a.random_id BETWEEN 10 and 10.01
"""
df1 = qc.query(sql=query1,fmt='pandas',timeout=600)
df1 = df1.sort_values('id1')
Now, the same but using pre-crossmatched tables:ΒΆ
query2 = """
SELECT
X.id1, X.id2, X.ra1, X.dec1, X.ra2, X.dec2, X.distance
FROM
allwise.x1p5__source__sdss_dr17__specobj AS X
JOIN
allwise.source AS a ON X.id1 = a.source_id
WHERE
a.random_id BETWEEN 10 and 10.01
"""
df2 = qc.query(sql=query2,fmt='pandas')
df2 = df2.sort_values('id1')
Distance histograms for first exampleΒΆ
Here we plot the two distance histograms to demonstrate that the results obtained by both the JOIN and the q3c crossmatch queries are identical.
plt.hist(df1['distance'],histtype='step',color='b',lw=3,ls='-',alpha=0.5,label='Crossmatching manually')
plt.hist(df2['distance'],histtype='step',color='r',lw=4,ls=':',alpha=0.5,label='Using pre-crossmatched table')
plt.xlabel('angular distance (arcsec)')
plt.ylabel('number of sources')
plt.legend(loc='upper right',frameon=False)
plt.show()
Second example: we use a different catalog, and switch the order of queriesΒΆ
We again select objects from two catalogs and retrieve the same specified columns and the same random rows for two queries. query3 uses a pre-crossmatched table and a JOIN query while query4 crossmatches directly in the query using the q3c_join() function. We will see that query3 retrieves results faster than query4 can retrieve results.
First, using pre-crossmatched tables:ΒΆ
query3 = """
SELECT
X.id1, X.id2, X.ra1, X.dec1, X.ra2, X.dec2, X.distance
FROM
unwise_dr1.x1p5__object__sdss_dr17__specobj AS X
JOIN
unwise_dr1.object AS u ON X.id1 = u.unwise_objid
WHERE
u.random_id BETWEEN 10 and 10.01
"""
df3 = qc.query(sql=query3,fmt='pandas',timeout=600)
df3 = df3.sort_values('id1')
Now, running the crossmatch ourselves:ΒΆ
query4 = """
SELECT
u.unwise_objid AS id1, u.ra AS ra1 ,u.dec AS dec1,
ss.specobjid AS id2, ss.ra AS ra2, ss.dec AS dec2,
(q3c_dist(u.ra, u.dec, ss.ra, ss.dec)*3600.0) AS distance
FROM
unwise_dr1.object AS u
INNER JOIN LATERAL (
SELECT
s.specobjid, s.ra, s.dec
FROM
sdss_dr17.specobj AS s
WHERE
q3c_join(u.ra, u.dec, s.ra, s.dec, 1.5/3600.0)
ORDER BY
random()
ASC LIMIT 1
) as ss ON true
WHERE
u.random_id BETWEEN 10 and 10.01
"""
df4 = qc.query(sql=query4,fmt='pandas',timeout=600)
df4 = df4.sort_values('id1')
Distance histograms for second exampleΒΆ
Here we plot the two distance histograms to demonstrate that the results obtained by both the JOIN and the q3c crossmatch queries are identical.
plt.hist(df4['distance'],histtype='step',color='b',lw=3,ls='-',alpha=0.5,label='Crossmatching manually')
plt.hist(df3['distance'],histtype='step',color='r',lw=4,ls=':',alpha=0.5,label='Using pre-crossmatched table')
plt.xlabel('angular distance (arcsec)')
plt.ylabel('number of sources')
plt.legend(loc='upper right',frameon=False)
plt.show()
AppendixΒΆ
A clear benefit of pre-crossmatched tables is that they contain the positions of the same objects in two datasets. We can use this to e.g. fetch images of an object from both surveys.
A1. unWISE DR1 vs LS DR9ΒΆ
Here we will compare two images of the same object from two different catalogs, unWISE DR1 and LS DR9.
Function to retrieve cutoutsΒΆ
def make_cutout_comparison_table(ra_in1, dec_in1, layer1, layer2, pixscale, ra_in2=None, dec_in2=None):
"""
Obtain color JPEG images from Legacy Survey team cutout tool at NERSC
"""
img1 = []
img2 = []
for i in range(len(ra_in1)):
cutout_url1 = "https://www.legacysurvey.org/viewer/cutout.jpg?ra=%g&dec=%g&layer=%s&pixscale=%s" % (ra_in1[i],dec_in1[i],layer1,pixscale)
img = plt.imread(download_file(cutout_url1,cache=True,show_progress=False,timeout=120))
img1.append(img)
cutout_url2 = "https://www.legacysurvey.org/viewer/cutout.jpg?ra=%g&dec=%g&layer=%s&pixscale=%s" % (ra_in2[i],dec_in2[i],layer2,pixscale)
img = plt.imread(download_file(cutout_url2,cache=True,show_progress=False,timeout=120))
img2.append(img)
return img1,img2
Function to generate plotsΒΆ
def plot_cutouts(img1,img2,cat1,cat2):
"""
Plot images in two rows with 5 images in each row
"""
fig = plt.figure(figsize=(21,7))
for i in range(len(img1)):
ax = fig.add_subplot(2,6,i+1)
ax.imshow(img1[i])
ax.xaxis.set_major_formatter(NullFormatter())
ax.yaxis.set_major_formatter(NullFormatter())
ax.tick_params(axis='both',which='both',length=0)
ax.text(0.02,0.93,'ra=%.5f'%list_ra1[i],transform=ax.transAxes,fontsize=12,color='white')
ax.text(0.02,0.85,'dec=%.5f'%list_dec1[i],transform=ax.transAxes,fontsize=12,color='white')
ax.text(0.02,0.77,cat1,transform=ax.transAxes,fontsize=12,color='white')
ax = fig.add_subplot(2,6,i+7)
ax.imshow(img2[i])
ax.xaxis.set_major_formatter(NullFormatter())
ax.yaxis.set_major_formatter(NullFormatter())
ax.tick_params(axis='both',which='both',length=0)
ax.text(0.02,0.93,'ra=%.5f'%list_ra2[i],transform=ax.transAxes,fontsize=12,color='white')
ax.text(0.02,0.85,'dec=%.5f'%list_dec2[i],transform=ax.transAxes,fontsize=12,color='white')
ax.text(0.02,0.77,cat2,transform=ax.transAxes,fontsize=12,color='white')
plt.subplots_adjust(wspace=0.02, hspace=0.03)
Write query to randomly select five targets (RA/Dec positions) from unWISE DR1 and LS DR9 crossmatch tableΒΆ
... then save them as arrays and set the captions, layers, and pixscale. Finally we plot the cutout images.
q = """
SELECT
ra1, dec1, ra2, dec2
FROM
unwise_dr1.x1p5__object__ls_dr9__tractor
WHERE
ra1>300 and dec1>33
ORDER BY
random()
LIMIT 5
"""
r = qc.query(sql=q,fmt='pandas')
list_ra1=r['ra1'].values # ".values" convert to numpy array
list_dec1=r['dec1'].values
list_ra2=r['ra2'].values
list_dec2=r['dec2'].values
cat1='unWISE'
cat2='ls dr9'
layer1='unwise-neo6'
layer2='ls-dr9'
pixscale='0.3'
img1,img2 = make_cutout_comparison_table(list_ra1,list_dec1,layer1,layer2,
pixscale,list_ra2,list_dec2)
plot_cutouts(img1,img2,cat1,cat2)
A2. SDSS vs DES DR1ΒΆ
Here we will compare two images of the same object from two different catalogs, SDSS and DES DR1.
Write query to randomly select five targets (RA/Dec positions) from SDSS DR17 and DES DR1 crossmatch tableΒΆ
... then save them as arrays and set the captions, layers, and pixscale. Finally we plot the cutout images.
q = """
SELECT
ra1, dec1, ra2, dec2
FROM
sdss_dr17.x1p5__specobj__des_dr1__main
ORDER BY
random()
LIMIT 5
"""
r = qc.query(sql=q,fmt='pandas')
list_ra1=r['ra1'].values # ".values" convert to numpy array
list_dec1=r['dec1'].values
list_ra2=r['ra2'].values
list_dec2=r['dec2'].values
cat1='sdss'
cat2='des dr1'
layer1='sdss'
layer2='des-dr1'
pixscale='0.25'
img1,img2 = make_cutout_comparison_table(list_ra1,list_dec1,layer1,layer2,
pixscale,list_ra2,list_dec2)
plot_cutouts(img1,img2,cat1,cat2)
A3. Cool galaxy finds: SDSS vs DES DR1ΒΆ
We compare two images of the same galaxy from two different catalogs, SDSS and DES DR1. We use a list of identified galaxies (RA/Dec positions) to compare the difference in observable features and quality between the two catalogs.
First we import the CSV file of identified galaxies (RA/Dec positions) into MyDB:
qc.mydb_import('gals','./gals.csv',drop=True)
'OK'
We write the query to select the first five RA/Dec positions from our table. We then save them as arrays and set the captions, layers, and pixscale. Finally we plot the cutout images.
qg = "SELECT ra, dec FROM mydb://gals LIMIT 5"
rg = qc.query(sql=qg)
rp = convert(rg)
list_ra1=rp['ra'].values
list_dec1=rp['dec'].values
list_ra2=rp['ra'].values
list_dec2=rp['dec'].values
cat1='sdss dr17'
cat2='des dr1'
layer1='sdss'
layer2='des-dr1'
pixscale='0.5'
img1,img2 = make_cutout_comparison_table(list_ra1,list_dec1,layer1,layer2,
pixscale,ra_in2=list_ra1,dec_in2=list_dec1)
plot_cutouts(img1,img2,cat1,cat2)
We write the next query to select the next five RA/Dec positions from our table. We then save them as arrays and set the captions, layers, and pixscale. Finally we plot the cutout images.
qg = "SELECT ra, dec FROM mydb://gals LIMIT 5 OFFSET 5"
rg = qc.query(sql=qg)
rp = convert(rg)
list_ra1=rp['ra'].values
list_dec1=rp['dec'].values
list_ra2=rp['ra'].values
list_dec2=rp['dec'].values
img1,img2 = make_cutout_comparison_table(list_ra1,list_dec1,layer1,layer2,
pixscale,ra_in2=list_ra1,dec_in2=list_dec1)
plot_cutouts(img1,img2,cat1,cat2)
We write the next query to select the last five RA/Dec positions from our table. We then save them as arrays and set the captions, layers, and pixscale. Finally we plot the cutout images.
qg = "SELECT ra, dec FROM mydb://gals LIMIT 5 OFFSET 10"
rg = qc.query(sql=qg)
rp = convert(rg)
list_ra1=rp['ra'].values
list_dec1=rp['dec'].values
list_ra2=rp['ra'].values
list_dec2=rp['dec'].values
img1,img2 = make_cutout_comparison_table(list_ra1,list_dec1,layer1,layer2,
pixscale,ra_in2=list_ra1,dec_in2=list_dec1)
plot_cutouts(img1,img2,cat1,cat2)
Resources & referencesΒΆ
Legacy Survey Sky Browser: https://www.legacysurvey.org/viewer#NGC%203098